Overview

Dataset statistics

Number of variables: 24
Number of observations: 29965
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 5.1 MiB
Average record size in memory: 178.0 B
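
As a rough consistency check, the total in-memory size should be close to the average record size multiplied by the number of observations. The sketch below uses only the rounded figures reported above and assumes 1 MiB = 1,048,576 bytes, so agreement is expected only to within rounding.

```python
# Figures copied from the dataset statistics above (already rounded by the report).
N_OBSERVATIONS = 29965
AVG_RECORD_BYTES = 178.0

BYTES_PER_MIB = 1024 * 1024  # assumption: binary mebibytes


def total_size_mib(n_rows: int, avg_record_bytes: float) -> float:
    """Estimate total in-memory size in MiB from the average record size."""
    return n_rows * avg_record_bytes / BYTES_PER_MIB


estimated_mib = total_size_mib(N_OBSERVATIONS, AVG_RECORD_BYTES)
print(f"{estimated_mib:.1f} MiB")
```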

Variable types

NUM: 20
CAT: 2
BOOL: 2

Reproduction

Analysis started: 2020-09-08 21:46:25.904039
Analysis finished: 2020-09-08 21:49:16.310766
Duration: 2 minutes and 50.41 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

High correlation: BILL_AMT2 is highly correlated with BILL_AMT1 and 1 other field
High correlation: BILL_AMT1 is highly correlated with BILL_AMT2
High correlation: BILL_AMT3 is highly correlated with BILL_AMT2 and 1 other field
High correlation: BILL_AMT4 is highly correlated with BILL_AMT3 and 2 other fields
High correlation: BILL_AMT5 is highly correlated with BILL_AMT4 and 1 other field
High correlation: BILL_AMT6 is highly correlated with BILL_AMT4 and 1 other field
Skewed: PAY_AMT2 is highly skewed (γ1 = 30.43861292)
Zeros: PAY_0 has 14737 (49.2%) zeros
Zeros: PAY_2 has 15730 (52.5%) zeros
Zeros: PAY_3 has 15764 (52.6%) zeros
Zeros: PAY_4 has 16455 (54.9%) zeros
Zeros: PAY_5 has 16947 (56.6%) zeros
Zeros: PAY_6 has 16286 (54.4%) zeros
Zeros: BILL_AMT1 has 1978 (6.6%) zeros
Zeros: BILL_AMT2 has 2476 (8.3%) zeros
Zeros: BILL_AMT3 has 2840 (9.5%) zeros
Zeros: BILL_AMT4 has 3165 (10.6%) zeros
Zeros: BILL_AMT5 has 3476 (11.6%) zeros
Zeros: BILL_AMT6 has 3990 (13.3%) zeros
Zeros: PAY_AMT1 has 5218 (17.4%) zeros
Zeros: PAY_AMT2 has 5365 (17.9%) zeros
Zeros: PAY_AMT3 has 5937 (19.8%) zeros
Zeros: PAY_AMT4 has 6377 (21.3%) zeros
Zeros: PAY_AMT5 has 6672 (22.3%) zeros
Zeros: PAY_AMT6 has 7142 (23.8%) zeros
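
The percentages in the zeros warnings are the zero counts divided by the number of observations (29965). The following minimal sketch reproduces a few of them from the counts listed above; the helper name `zero_share` is hypothetical and not part of pandas-profiling.

```python
N_OBSERVATIONS = 29965  # number of observations reported in the overview


def zero_share(zero_count: int, n_rows: int = N_OBSERVATIONS) -> str:
    """Format the share of zeros as a percentage with one decimal place."""
    return f"{100 * zero_count / n_rows:.1f}%"


# Zero counts copied from the warnings above.
for name, zeros in [("PAY_0", 14737), ("BILL_AMT1", 1978), ("PAY_AMT6", 7142)]:
    print(f"{name} has {zeros} ({zero_share(zeros)}) zeros")
```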

Variables

LIMIT_BAL
Real number (ℝ≥0)

Distinct count: 81
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 167442.00500584015
Minimum: 10000
Maximum: 1000000
Zeros: 0
Zeros (%): 0.0%
Memory size: 234.1 KiB

Quantile statistics

Minimum: 10000
5-th percentile: 20000
Q1: 50000
Median: 140000
Q3: 240000
95-th percentile: 430000
Maximum: 1000000
Range: 990000
Interquartile range (IQR): 190000

Descriptive statistics

Standard deviation: 129760.1352
Coefficient of variation (CV): 0.7749556942
Kurtosis: 0.5375871217
Mean: 167442.005
Median Absolute Deviation (MAD): 90000
Skewness: 0.9934913272
Sum: 5017399680
Variance: 1.683769269e+10
Histogram with fixed size bins (bins=10)
Value, Count, Frequency (%)
50000, 3363, 11.2%
20000, 1975, 6.6%
30000, 1610, 5.4%
80000, 1564, 5.2%
200000, 1524, 5.1%
150000, 1107, 3.7%
100000, 1047, 3.5%
180000, 993, 3.3%
360000, 874, 2.9%
60000, 825, 2.8%
Other values (71), 15083, 50.3%

Value, Count, Frequency (%)
10000, 493, 1.6%
16000, 2, < 0.1%
20000, 1975, 6.6%
30000, 1610, 5.4%
40000, 230, 0.8%

Value, Count, Frequency (%)
1000000, 1, < 0.1%
800000, 2, < 0.1%
780000, 2, < 0.1%
760000, 1, < 0.1%
750000, 4, < 0.1%

MARRIAGE
Categorical

Distinct count: 4
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 234.1 KiB

Value, Count, Frequency (%)
2, 15945, 53.2%
1, 13643, 45.5%
3, 323, 1.1%
0, 54, 0.2%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

AGE
Real number (ℝ≥0)

Distinct count: 56
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.487969297513764
Minimum: 21
Maximum: 79
Zeros: 0
Zeros (%): 0.0%
Memory size: 234.1 KiB

Quantile statistics

Minimum: 21
5-th percentile: 23
Q1: 28
Median: 34
Q3: 41
95-th percentile: 53
Maximum: 79
Range: 58
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 9.219459233
Coefficient of variation (CV): 0.2597911184
Kurtosis: 0.04398801494
Mean: 35.4879693
Median Absolute Deviation (MAD): 6
Skewness: 0.7320560019
Sum: 1063397
Variance: 84.99842855
Histogram with fixed size bins (bins=10)
Value, Count, Frequency (%)
29, 1602, 5.3%
27, 1475, 4.9%
28, 1406, 4.7%
30, 1394, 4.7%
26, 1252, 4.2%
31, 1213, 4.0%
25, 1185, 4.0%
34, 1161, 3.9%
32, 1157, 3.9%
33, 1146, 3.8%
Other values (46), 16974, 56.6%

Value, Count, Frequency (%)
21, 67, 0.2%
22, 560, 1.9%
23, 930, 3.1%
24, 1126, 3.8%
25, 1185, 4.0%

Value, Count, Frequency (%)
79, 1, < 0.1%
75, 3, < 0.1%
74, 1, < 0.1%
73, 4, < 0.1%
72, 3, < 0.1%

PAY_0
Real number (ℝ)

ZEROS

Distinct count: 11
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.016752878358084432
Minimum: -2
Maximum: 8
Zeros: 14737
Zeros (%): 49.2%
Memory size: 234.1 KiB

Quantile statistics

Minimum: -2
5-th percentile: -2
Q1: -1
Median: 0
Q3: 0
95-th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.123492034
Coefficient of variation (CV): -67.06262707
Kurtosis: 2.730038381
Mean: -0.01675287836
Median Absolute Deviation (MAD): 1
Skewness: 0.7346064765
Sum: -502
Variance: 1.26223435
Histogram with fixed size bins (bins=10)
Value, Count, Frequency (%)
0, 14737, 49.2%
-1, 5682, 19.0%
1, 3667, 12.2%
-2, 2750, 9.2%
2, 2666, 8.9%
3, 322, 1.1%
4, 76, 0.3%
5, 26, 0.1%
8, 19, 0.1%
6, 11, < 0.1%

Value, Count, Frequency (%)
-2, 2750, 9.2%
-1, 5682, 19.0%
0, 14737, 49.2%
1, 3667, 12.2%
2, 2666, 8.9%

Value, Count, Frequency (%)
8, 19, 0.1%
7, 9, < 0.1%
6, 11, < 0.1%
5, 26, 0.1%
4, 76, 0.3%

PAY_2
Real number (ℝ)

ZEROS

Distinct count: 11
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.13185382946771232
Minimum: -2
Maximum: 8
Zeros: 15730
Zeros (%): 52.5%
Memory size: 234.1 KiB

Quantile statistics

Minimum: -2
5-th percentile: -2
Q1: -1
Median: 0
Q3: 0
95-th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.196321699
Coefficient of variation (CV): -9.073090281
Kurtosis: 1.577608705
Mean: -0.1318538295
Median Absolute Deviation (MAD): 0
Skewness: 0.7920704147
Sum: -3951
Variance: 1.431185607
Histogram with fixed size bins (bins=10)
Value, Count, Frequency (%)
0, 15730, 52.5%
-1, 6046, 20.2%
2, 3926, 13.1%
-2, 3752, 12.5%
3, 326, 1.1%
4, 99, 0.3%
1, 28, 0.1%
5, 25, 0.1%
7, 20, 0.1%
6, 12, < 0.1%

Value, Count, Frequency (%)
-2, 3752, 12.5%
-1, 6046, 20.2%
0, 15730, 52.5%
1, 28, 0.1%
2, 3926, 13.1%

Value, Count, Frequency (%)
8, 1, < 0.1%
7, 20, 0.1%
6, 12, < 0.1%
5, 25, 0.1%
4, 99, 0.3%

PAY_3
Real number (ℝ)

ZEROS

Distinct count: 11
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.16439179042215918
Minimum: -2
Maximum: 8
Zeros: 15764
Zeros (%): 52.6%
Memory size: 234.1 KiB

Quantile statistics

Minimum: -2
5-th percentile: -2
Q1: -1
Median: 0
Q3: 0
95-th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.195877509
Coefficient of variation (CV): -7.274557358
Kurtosis: 2.091665951
Mean: -0.1643917904
Median Absolute Deviation (MAD): 0
Skewness: 0.8414639808
Sum: -4926
Variance: 1.430123016
Histogram with fixed size bins (bins=10)
Value, Count, Frequency (%)
0, 15764, 52.6%
-1, 5934, 19.8%
-2, 4055, 13.5%
2, 3819, 12.7%
3, 240, 0.8%
4, 75, 0.3%
7, 27, 0.1%
6, 23, 0.1%
5, 21, 0.1%
1, 4, < 0.1%

Value, Count, Frequency (%)
-2, 4055, 13.5%
-1, 5934, 19.8%
0, 15764, 52.6%
1, 4, < 0.1%
2, 3819, 12.7%

Value, Count, Frequency (%)
8, 3, < 0.1%
7, 27, 0.1%
6, 23, 0.1%
5, 21, 0.1%
4, 75, 0.3%

PAY_4
Real number (ℝ)

ZEROS

Distinct count: 11
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.21892207575504755
Minimum: -2
Maximum: 8
Zeros: 16455
Zeros (%): 54.9%
Memory size: 234.1 KiB

Quantile statistics

Minimum: -2
5-th percentile: -2
Q1: -1
Median: 0
Q3: 0
95-th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.168175186
Coefficient of variation (CV): -5.33603193
Kurtosis: 3.508962108
Mean: -0.2189220758
Median Absolute Deviation (MAD): 0
Skewness: 1.000798562
Sum: -6560
Variance: 1.364633266
Histogram with fixed size bins (bins=10)
Value, Count, Frequency (%)
0, 16455, 54.9%
-1, 5683, 19.0%
-2, 4318, 14.4%
2, 3159, 10.5%
3, 180, 0.6%
4, 68, 0.2%
7, 58, 0.2%
5, 35, 0.1%
6, 5, < 0.1%
8, 2, < 0.1%

Value, Count, Frequency (%)
-2, 4318, 14.4%
-1, 5683, 19.0%
0, 16455, 54.9%
1, 2, < 0.1%
2, 3159, 10.5%

Value, Count, Frequency (%)
8, 2, < 0.1%
7, 58, 0.2%
6, 5, < 0.1%
5, 35, 0.1%
4, 68, 0.2%

PAY_5
Real number (ℝ)

ZEROS

Distinct count: 10
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.26450859335891874
Minimum: -2
Maximum: 8
Zeros: 16947
Zeros (%): 56.6%
Memory size: 234.1 KiB

Quantile statistics

Minimum: -2
5-th percentile: -2
Q1: -1
Median: 0
Q3: 0
95-th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.132219856
Coefficient of variation (CV): -4.280465302
Kurtosis: 4.003562263
Mean: -0.2645085934
Median Absolute Deviation (MAD): 0
Skewness: 1.009329021
Sum: -7926
Variance: 1.281921802
Histogram with fixed size bins (bins=10)
Value, Count, Frequency (%)
0, 16947, 56.6%
-1, 5535, 18.5%
-2, 4516, 15.1%
2, 2626, 8.8%
3, 178, 0.6%
4, 83, 0.3%
7, 58, 0.2%
5, 17, 0.1%
6, 4, < 0.1%
8, 1, < 0.1%

Value, Count, Frequency (%)
-2, 4516, 15.1%
-1, 5535, 18.5%
0, 16947, 56.6%
2, 2626, 8.8%
3, 178, 0.6%

Value, Count, Frequency (%)
8, 1, < 0.1%
7, 58, 0.2%
6, 4, < 0.1%
5, 17, 0.1%
4, 83, 0.3%

PAY_6
Real number (ℝ)

ZEROS

Distinct count: 10
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.2894376772901719
Minimum: -2
Maximum: 8
Zeros: 16286
Zeros (%): 54.4%
Memory size: 234.1 KiB

Quantile statistics

Minimum: -2
5-th percentile: -2
Q1: -1
Median: 0
Q3: 0
95-th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.1490901
Coefficient of variation (CV): -3.970077809
Kurtosis: 3.437256875
Mean: -0.2894376773
Median Absolute Deviation (MAD): 0
Skewness: 0.9486089933
Sum: -8673
Variance: 1.320408057
Histogram with fixed size bins (bins=10)
Value, Count, Frequency (%)
0, 16286, 54.4%
-1, 5736, 19.1%
-2, 4865, 16.2%
2, 2766, 9.2%
3, 184, 0.6%
4, 48, 0.2%
7, 46, 0.2%
6, 19, 0.1%
5, 13, < 0.1%
8, 2, < 0.1%

Value, Count, Frequency (%)
-2, 4865, 16.2%
-1, 5736, 19.1%
0, 16286, 54.4%
2, 2766, 9.2%
3, 184, 0.6%

Value, Count, Frequency (%)
8, 2, < 0.1%
7, 46, 0.2%
6, 19, 0.1%
5, 13, < 0.1%
4, 48, 0.2%

BILL_AMT1
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct count: 22723
Unique (%): 75.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 51283.00977807442
Minimum: -165580
Maximum: 964511
Zeros: 1978
Zeros (%): 6.6%
Memory size: 234.1 KiB

Quantile statistics

Minimum: -165580
5-th percentile: 0
Q1: 3595
Median: 22438
Q3: 67260
95-th percentile: 201303.8
Maximum: 964511
Range: 1130091
Interquartile range (IQR): 63665

Descriptive statistics

Standard deviation: 73658.1324
Coefficient of variation (CV): 1.436306736
Kurtosis: 9.796846218
Mean: 51283.00978
Median Absolute Deviation (MAD): 21842
Skewness: 2.662513456
Sum: 1536695388
Variance: 5425520469
Histogram with fixed size bins (bins=10)
Value, Count, Frequency (%)
0, 1978, 6.6%
390, 243, 0.8%
780, 76, 0.3%
326, 72, 0.2%
316, 63, 0.2%
2500, 59, 0.2%
396, 48, 0.2%
2400, 39, 0.1%
416, 29, 0.1%
1050, 25, 0.1%
Other values (22713), 27333, 91.2%

Value, Count, Frequency (%)
-165580, 1, < 0.1%
-154973, 1, < 0.1%
-15308, 1, < 0.1%
-14386, 1, < 0.1%
-11545, 1, < 0.1%

Value, Count, Frequency (%)
964511, 1, < 0.1%
746814, 1, < 0.1%
653062, 1, < 0.1%
630458, 1, < 0.1%
626648, 1, < 0.1%

BILL_AMT2
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct count: 22346
Unique (%): 74.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 49236.36629400968
Minimum: -69777
Maximum: 983931
Zeros: 2476
Zeros (%): 8.3%
Memory size: 234.1 KiB

Quantile statistics

Minimum: -69777
5-th percentile: 0
Q1: 3010
Median: 21295
Q3: 64109
95-th percentile: 194889.6
Maximum: 983931
Range: 1053708
Interquartile range (IQR): 61099

Descriptive statistics

Standard deviation: 71195.56739
Coefficient of variation (CV): 1.445995567
Kurtosis: 10.29321199
Mean: 49236.36629
Median Absolute Deviation (MAD): 20905
Skewness: 2.70386174
Sum: 1475367716
Variance: 5068808816
Histogram with fixed size bins (bins=10)
Value, Count, Frequency (%)
0, 2476, 8.3%
390, 230, 0.8%
326, 75, 0.3%
780, 75, 0.3%
316, 72, 0.2%
2500, 51, 0.2%
396, 50, 0.2%
2400, 42, 0.1%
-200, 29, 0.1%
416, 28, 0.1%
Other values (22336), 26837, 89.6%

Value, Count, Frequency (%)
-69777, 1, < 0.1%
-67526, 1, < 0.1%
-33350, 1, < 0.1%
-30000, 1, < 0.1%
-26214, 1, < 0.1%

Value, Count, Frequency (%)
983931, 1, < 0.1%
743970, 1, < 0.1%
671563, 1, < 0.1%
646770, 1, < 0.1%
624475, 1, < 0.1%

BILL_AMT3
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct count: 22026
Unique (%): 73.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 47067.91606874687
Minimum: -157264
Maximum: 1664089
Zeros: 2840
Zeros (%): 9.5%
Memory size: 234.1 KiB

Quantile statistics

Minimum: -157264
5-th percentile: 0
Q1: 2711
Median: 20135
Q3: 60201
95-th percentile: 187901
Maximum: 1664089
Range: 1821353
Interquartile range (IQR): 57490

Descriptive statistics

Standard deviation: 69371.35232
Coefficient of variation (CV): 1.473856464
Kurtosis: 19.77100256
Mean: 47067.91607
Median Absolute Deviation (MAD): 19745
Skewness: 3.086493832
Sum: 1410390105
Variance: 4812384523
Histogram with fixed size bins (bins=10)
Value, Count, Frequency (%)
0, 2840, 9.5%
390, 274, 0.9%
780, 74, 0.2%
326, 63, 0.2%
316, 62, 0.2%
396, 47, 0.2%
2500, 40, 0.1%
2400, 39, 0.1%
416, 29, 0.1%
200, 27, 0.1%
Other values (22016), 26470, 88.3%

Value, Count, Frequency (%)
-157264, 1, < 0.1%
-61506, 1, < 0.1%
-46127, 1, < 0.1%
-34041, 1, < 0.1%
-25443, 1, < 0.1%

Value, Count, Frequency (%)
1664089, 1, < 0.1%
855086, 1, < 0.1%
693131, 1, < 0.1%
689643, 1, < 0.1%
689627, 1, < 0.1%

BILL_AMT4
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct count: 21548
Unique (%): 71.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 43313.32988486568
Minimum: -170000
Maximum: 891586
Zeros: 3165
Zeros (%): 10.6%
Memory size: 234.1 KiB

Quantile statistics

Minimum: -170000
5-th percentile: 0
Q1: 2360
Median: 19081
Q3: 54601
95-th percentile: 174469.8
Maximum: 891586
Range: 1061586
Interquartile range (IQR): 52241

Descriptive statistics

Standard deviation: 64353.51437
Coefficient of variation (CV): 1.485766958
Kurtosis: 11.29858229
Mean: 43313.32988
Median Absolute Deviation (MAD): 18681
Skewness: 2.820544832
Sum: 1297883930
Variance: 4141374812
Histogram with fixed size bins (bins=10)
Value, Count, Frequency (%)
0, 3165, 10.6%
390, 245, 0.8%
780, 101, 0.3%
316, 68, 0.2%
326, 62, 0.2%
396, 43, 0.1%
150, 39, 0.1%
2400, 39, 0.1%
2500, 34, 0.1%
1000, 33, 0.1%
Other values (21538), 26136, 87.2%

Value, Count, Frequency (%)
-170000, 1, < 0.1%
-81334, 1, < 0.1%
-65167, 1, < 0.1%
-50616, 1, < 0.1%
-46627, 1, < 0.1%

Value, Count, Frequency (%)
891586, 1, < 0.1%
706864, 1, < 0.1%
628699, 1, < 0.1%
616836, 1, < 0.1%
572805, 1, < 0.1%

BILL_AMT5
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct count: 21010
Unique (%): 70.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40358.33439012181
Minimum: -81334
Maximum: 927171
Zeros: 3476
Zeros (%): 11.6%
Memory size: 234.1 KiB

Quantile statistics

Minimum: -81334
5-th percentile: 0
Q1: 1787
Median: 18130
Q3: 50247
95-th percentile: 165805.6
Maximum: 927171
Range: 1008505
Interquartile range (IQR): 48460

Descriptive statistics

Standard deviation: 60817.13062
Coefficient of variation (CV): 1.506928657
Kurtosis: 12.29453891
Mean: 40358.33439
Median Absolute Deviation (MAD): 17714
Skewness: 2.874925049
Sum: 1209337490
Variance: 3698723377
Histogram with fixed size bins (bins=10)
Value, Count, Frequency (%)
0, 3476, 11.6%
390, 234, 0.8%
780, 94, 0.3%
316, 79, 0.3%
326, 62, 0.2%
150, 58, 0.2%
396, 46, 0.2%
2400, 39, 0.1%
2500, 37, 0.1%
416, 36, 0.1%
Other values (21000), 25804, 86.1%

Value, Count, Frequency (%)
-81334, 1, < 0.1%
-61372, 1, < 0.1%
-53007, 1, < 0.1%
-46627, 1, < 0.1%
-37594, 1, < 0.1%

Value, Count, Frequency (%)
927171, 1, < 0.1%
823540, 1, < 0.1%
587067, 1, < 0.1%
551702, 1, < 0.1%
547880, 1, < 0.1%

BILL_AMT6
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct count: 20604
Unique (%): 68.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 38917.012280994495
Minimum: -339603
Maximum: 961664
Zeros: 3990
Zeros (%): 13.3%
Memory size: 234.1 KiB

Quantile statistics

Minimum: -339603
5-th percentile: 0
Q1: 1262
Median: 17124
Q3: 49252
95-th percentile: 161932
Maximum: 961664
Range: 1301267
Interquartile range (IQR): 47990

Descriptive statistics

Standard deviation: 59574.14774
Coefficient of variation (CV): 1.530799623
Kurtosis: 12.25912611
Mean: 38917.01228
Median Absolute Deviation (MAD): 16808
Skewness: 2.845137169
Sum: 1166148273
Variance: 3549079079
Histogram with fixed size bins (bins=10)
Value, Count, Frequency (%)
0, 3990, 13.3%
390, 206, 0.7%
780, 86, 0.3%
150, 78, 0.3%
316, 77, 0.3%
326, 56, 0.2%
396, 44, 0.1%
416, 36, 0.1%
-18, 33, 0.1%
2400, 32, 0.1%
Other values (20594), 25327, 84.5%

Value, Count, Frequency (%)
-339603, 1, < 0.1%
-209051, 1, < 0.1%
-150953, 1, < 0.1%
-94625, 1, < 0.1%
-73895, 1, < 0.1%

Value, Count, Frequency (%)
961664, 1, < 0.1%
699944, 1, < 0.1%
568638, 1, < 0.1%
527711, 1, < 0.1%
527566, 1, < 0.1%

PAY_AMT1
Real number (ℝ≥0)

ZEROS

Distinct count: 7943
Unique (%): 26.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5670.099315868513
Minimum: 0
Maximum: 873552
Zeros: 5218
Zeros (%): 17.4%
Memory size: 234.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1000
Median: 2102
Q3: 5008
95-th percentile: 18447.2
Maximum: 873552
Range: 873552
Interquartile range (IQR): 4008

Descriptive statistics

Standard deviation: 16571.84947
Coefficient of variation (CV): 2.92267358
Kurtosis: 414.8548633
Mean: 5670.099316
Median Absolute Deviation (MAD): 1929
Skewness: 14.66159454
Sum: 169904526
Variance: 274626194.7
Histogram with fixed size bins (bins=10)
Value, Count, Frequency (%)
0, 5218, 17.4%
2000, 1363, 4.5%
3000, 891, 3.0%
5000, 698, 2.3%
1500, 507, 1.7%
4000, 426, 1.4%
10000, 401, 1.3%
1000, 365, 1.2%
2500, 298, 1.0%
6000, 294, 1.0%
Other values (7933), 19504, 65.1%

Value, Count, Frequency (%)
0, 5218, 17.4%
1, 9, < 0.1%
2, 14, < 0.1%
3, 15, 0.1%
4, 18, 0.1%

Value, Count, Frequency (%)
873552, 1, < 0.1%
505000, 1, < 0.1%
493358, 1, < 0.1%
423903, 1, < 0.1%
405016, 1, < 0.1%

PAY_AMT2
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct count: 7899
Unique (%): 26.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5927.983180377107
Minimum: 0
Maximum: 1684259
Zeros: 5365
Zeros (%): 17.9%
Memory size: 234.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 850
Median: 2010
Q3: 5000
95-th percentile: 19030.8
Maximum: 1684259
Range: 1684259
Interquartile range (IQR): 4150

Descriptive statistics

Standard deviation: 23053.45664
Coefficient of variation (CV): 3.888920724
Kurtosis: 1639.924451
Mean: 5927.98318
Median Absolute Deviation (MAD): 1990
Skewness: 30.43861292
Sum: 177632016
Variance: 531461863.3
Histogram with fixed size bins (bins=10)
Value, Count, Frequency (%)
0, 5365, 17.9%
2000, 1290, 4.3%
3000, 857, 2.9%
5000, 717, 2.4%
1000, 594, 2.0%
1500, 521, 1.7%
4000, 410, 1.4%
10000, 318, 1.1%
6000, 283, 0.9%
2500, 251, 0.8%
Other values (7889), 19359, 64.6%

Value, Count, Frequency (%)
0, 5365, 17.9%
1, 15, 0.1%
2, 20, 0.1%
3, 18, 0.1%
4, 11, < 0.1%

Value, Count, Frequency (%)
1684259, 1, < 0.1%
1227082, 1, < 0.1%
1215471, 1, < 0.1%
1024516, 1, < 0.1%
580464, 1, < 0.1%

PAY_AMT3
Real number (ℝ≥0)

ZEROS

Distinct count: 7518
Unique (%): 25.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5231.688836976473
Minimum: 0
Maximum: 896040
Zeros: 5937
Zeros (%): 19.8%
Memory size: 234.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 390
Median: 1804
Q3: 4512
95-th percentile: 17602.6
Maximum: 896040
Range: 896040
Interquartile range (IQR): 4122

Descriptive statistics

Standard deviation: 17616.36112
Coefficient of variation (CV): 3.367241759
Kurtosis: 563.7392771
Mean: 5231.688837
Median Absolute Deviation (MAD): 1796
Skewness: 17.2081766
Sum: 156767556
Variance: 310336179.3
Histogram with fixed size bins (bins=10)
Value, Count, Frequency (%)
0, 5937, 19.8%
2000, 1285, 4.3%
1000, 1103, 3.7%
3000, 870, 2.9%
5000, 721, 2.4%
1500, 490, 1.6%
4000, 381, 1.3%
10000, 312, 1.0%
1200, 243, 0.8%
6000, 241, 0.8%
Other values (7508), 18382, 61.3%

Value, Count, Frequency (%)
0, 5937, 19.8%
1, 13, < 0.1%
2, 19, 0.1%
3, 14, < 0.1%
4, 15, 0.1%

Value, Count, Frequency (%)
896040, 1, < 0.1%
889043, 1, < 0.1%
508229, 1, < 0.1%
417588, 1, < 0.1%
400972, 1, < 0.1%

PAY_AMT4
Real number (ℝ≥0)

ZEROS

Distinct count: 6937
Unique (%): 23.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4831.617453695979
Minimum: 0
Maximum: 621000
Zeros: 6377
Zeros (%): 21.3%
Memory size: 234.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 300
Median: 1500
Q3: 4016
95-th percentile: 16037
Maximum: 621000
Range: 621000
Interquartile range (IQR): 3716

Descriptive statistics

Standard deviation: 15674.46454
Coefficient of variation (CV): 3.244144365
Kurtosis: 277.0486932
Mean: 4831.617454
Median Absolute Deviation (MAD): 1500
Skewness: 12.89850649
Sum: 144779417
Variance: 245688838.5
Histogram with fixed size bins (bins=10)
Value, Count, Frequency (%)
0, 6377, 21.3%
1000, 1394, 4.7%
2000, 1214, 4.1%
3000, 887, 3.0%
5000, 810, 2.7%
1500, 441, 1.5%
4000, 402, 1.3%
10000, 341, 1.1%
2500, 259, 0.9%
500, 258, 0.9%
Other values (6927), 17582, 58.7%

Value, Count, Frequency (%)
0, 6377, 21.3%
1, 22, 0.1%
2, 22, 0.1%
3, 13, < 0.1%
4, 20, 0.1%

Value, Count, Frequency (%)
621000, 1, < 0.1%
528897, 1, < 0.1%
497000, 1, < 0.1%
432130, 1, < 0.1%
400046, 1, < 0.1%

PAY_AMT5
Real number (ℝ≥0)

ZEROS

Distinct count: 6897
Unique (%): 23.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4804.897046554313
Minimum: 0
Maximum: 426529
Zeros: 6672
Zeros (%): 22.3%
Memory size: 234.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 261
Median: 1500
Q3: 4042
95-th percentile: 16000
Maximum: 426529
Range: 426529
Interquartile range (IQR): 3781

Descriptive statistics

Standard deviation: 15286.3723
Coefficient of variation (CV): 3.181415158
Kurtosis: 179.8752095
Mean: 4804.897047
Median Absolute Deviation (MAD): 1500
Skewness: 11.12174174
Sum: 143978740
Variance: 233673178
Histogram with fixed size bins (bins=10)
Value, Count, Frequency (%)
0, 6672, 22.3%
1000, 1340, 4.5%
2000, 1323, 4.4%
3000, 947, 3.2%
5000, 814, 2.7%
1500, 426, 1.4%
4000, 401, 1.3%
10000, 343, 1.1%
500, 250, 0.8%
6000, 247, 0.8%
Other values (6887), 17202, 57.4%

Value, Count, Frequency (%)
0, 6672, 22.3%
1, 21, 0.1%
2, 13, < 0.1%
3, 13, < 0.1%
4, 12, < 0.1%

Value, Count, Frequency (%)
426529, 1, < 0.1%
417990, 1, < 0.1%
388071, 1, < 0.1%
379267, 1, < 0.1%
332000, 1, < 0.1%

PAY_AMT6
Real number (ℝ≥0)

ZEROS

Distinct count: 6939
Unique (%): 23.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5221.498014350075
Minimum: 0
Maximum: 528666
Zeros: 7142
Zeros (%): 23.8%
Memory size: 234.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 131
Median: 1500
Q3: 4000
95-th percentile: 17384.4
Maximum: 528666
Range: 528666
Interquartile range (IQR): 3869

Descriptive statistics

Standard deviation: 17786.97686
Coefficient of variation (CV): 3.406489252
Kurtosis: 166.9817897
Mean: 5221.498014
Median Absolute Deviation (MAD): 1500
Skewness: 10.63509397
Sum: 156462188
Variance: 316376546
Histogram with fixed size bins (bins=10)
Value, Count, Frequency (%)
0, 7142, 23.8%
1000, 1299, 4.3%
2000, 1295, 4.3%
3000, 914, 3.1%
5000, 808, 2.7%
1500, 439, 1.5%
4000, 411, 1.4%
10000, 356, 1.2%
500, 247, 0.8%
6000, 220, 0.7%
Other values (6929), 16834, 56.2%

Value, Count, Frequency (%)
0, 7142, 23.8%
1, 20, 0.1%
2, 9, < 0.1%
3, 14, < 0.1%
4, 12, < 0.1%

Value, Count, Frequency (%)
528666, 1, < 0.1%
527143, 1, < 0.1%
443001, 1, < 0.1%
422000, 1, < 0.1%
403500, 1, < 0.1%

SEX_male
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 29.3 KiB

Value, Count, Frequency (%)
0, 18091, 60.4%
1, 11874, 39.6%

default payment next month_default
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 29.3 KiB

Value, Count, Frequency (%)
0, 23335, 77.9%
1, 6630, 22.1%

EnEdu
Categorical

Distinct count: 4
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 234.1 KiB

Value, Count, Frequency (%)
3, 14019, 46.8%
4, 10563, 35.3%
2, 4915, 16.4%
1, 468, 1.6%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
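
The calculation described above can be sketched in plain Python as follows. This is an illustrative implementation of the textbook formula, not the code pandas-profiling runs internally; the function name `pearson_r` and the sample data are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Covariance of x and y divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of deviations are sufficient.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    return cov / math.sqrt(var_x * var_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(pearson_r(x, [2 * v + 10 for v in x]))  # perfectly linear, increasing
print(pearson_r(x, [-3 * v + 1 for v in x]))  # perfectly linear, decreasing
```

The two calls illustrate the invariance to location and scale: changing the slope magnitude or the intercept of a linear relationship leaves |r| at 1, and only the sign of the slope changes the sign of r.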

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
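
A minimal sketch of this rank-based calculation is shown below. It assumes that tied values receive the average of their ranks, which is a common convention but an assumption here; the function names and sample data are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # average of the 1-based positions
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Covariance of the rank variables over the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    n = len(rank_x)
    mean_x, mean_y = sum(rank_x) / n, sum(rank_y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    var_x = sum((a - mean_x) ** 2 for a in rank_x)
    var_y = sum((b - mean_y) ** 2 for b in rank_y)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when either variable is constant")
    return cov / math.sqrt(var_x * var_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]  # monotonic but not linear
print(spearman_rho(x, y))
```

Because only the ordering of the values matters, the cubic relationship in the example is treated as a perfect monotonic association.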

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013, which can be found here.
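
As an illustration, the sketch below computes a bias-corrected Cramér's V from a contingency table using the commonly cited form of Bergsma's correction. It is a plain-Python approximation of the idea, not the report's own implementation, and the example tables are made up for demonstration.

```python
import math
from typing import Sequence


def cramers_v_corrected(table: Sequence[Sequence[int]]) -> float:
    """Bias-corrected Cramér's V for an r x k contingency table of counts."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic from observed and expected counts.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias corrections to phi^2 and to the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denominator = min(k_corr - 1, r_corr - 1)
    if denominator <= 0:
        return 0.0  # degenerate table (a single row or column)
    return math.sqrt(phi2_corr / denominator)


print(cramers_v_corrected([[10, 20], [20, 40]]))  # rows proportional: independent
print(cramers_v_corrected([[50, 0], [0, 50]]))    # perfect association
```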

Missing values

Sample

First rows

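(index), LIMIT_BAL, MARRIAGE, AGE, PAY_0, PAY_2, PAY_3, PAY_4, PAY_5, PAY_6, BILL_AMT1, BILL_AMT2, BILL_AMT3, BILL_AMT4, BILL_AMT5, BILL_AMT6, PAY_AMT1, PAY_AMT2, PAY_AMT3, PAY_AMT4, PAY_AMT5, PAY_AMT6, SEX_male, default payment next month_default, EnEdu
0, 20000, 1, 24, 2, 2, -1, -1, -2, -2, 3913, 3102, 689, 0, 0, 0, 0, 689, 0, 0, 0, 0, 0, 1, 3
1, 120000, 2, 26, -1, 2, 0, 0, 0, 2, 2682, 1725, 2682, 3272, 3455, 3261, 0, 1000, 1000, 1000, 0, 2000, 0, 1, 3
2, 90000, 2, 34, 0, 0, 0, 0, 0, 0, 29239, 14027, 13559, 14331, 14948, 15549, 1518, 1500, 1000, 1000, 1000, 5000, 0, 0, 3
3, 50000, 1, 37, 0, 0, 0, 0, 0, 0, 46990, 48233, 49291, 28314, 28959, 29547, 2000, 2019, 1200, 1100, 1069, 1000, 0, 0, 3
4, 50000, 1, 57, -1, 0, -1, 0, 0, 0, 8617, 5670, 35835, 20940, 19146, 19131, 2000, 36681, 10000, 9000, 689, 679, 1, 0, 3
5, 50000, 2, 37, 0, 0, 0, 0, 0, 0, 64400, 57069, 57608, 19394, 19619, 20024, 2500, 1815, 657, 1000, 1000, 800, 1, 0, 4
6, 500000, 2, 29, 0, 0, 0, 0, 0, 0, 367965, 412023, 445007, 542653, 483003, 473944, 55000, 40000, 38000, 20239, 13750, 13770, 1, 0, 4
7, 100000, 2, 23, 0, -1, -1, 0, 0, -1, 11876, 380, 601, 221, -159, 567, 380, 601, 0, 581, 1687, 1542, 0, 0, 3
8, 140000, 1, 28, 0, 0, 2, 0, 0, 0, 11285, 14096, 12108, 12211, 11793, 3719, 3329, 0, 432, 1000, 1000, 1000, 0, 0, 2
9, 20000, 2, 35, -2, -2, -2, -2, -1, -1, 0, 0, 0, 0, 13007, 13912, 0, 0, 0, 13007, 1122, 0, 1, 0, 2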

Last rows

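(index), LIMIT_BAL, MARRIAGE, AGE, PAY_0, PAY_2, PAY_3, PAY_4, PAY_5, PAY_6, BILL_AMT1, BILL_AMT2, BILL_AMT3, BILL_AMT4, BILL_AMT5, BILL_AMT6, PAY_AMT1, PAY_AMT2, PAY_AMT3, PAY_AMT4, PAY_AMT5, PAY_AMT6, SEX_male, default payment next month_default, EnEdu
29955, 140000, 1, 41, 0, 0, 0, 0, 0, 0, 138325, 137142, 139110, 138262, 49675, 46121, 6000, 7000, 4228, 1505, 2000, 2000, 1, 0, 3
29956, 210000, 1, 34, 3, 2, 2, 2, 2, 2, 2500, 2500, 2500, 2500, 2500, 2500, 0, 0, 0, 0, 0, 0, 1, 1, 3
29957, 10000, 1, 43, 0, 0, 0, -2, -2, -2, 8802, 10400, 0, 0, 0, 0, 2000, 0, 0, 0, 0, 0, 1, 0, 2
29958, 100000, 2, 38, 0, -1, -1, 0, 0, 0, 3042, 1427, 102996, 70626, 69473, 55004, 2000, 111784, 4000, 3000, 2000, 2000, 1, 0, 4
29959, 80000, 2, 34, 2, 2, 2, 2, 2, 2, 72557, 77708, 79384, 77519, 82607, 81158, 7000, 3500, 0, 7000, 0, 4000, 1, 1, 3
29960, 220000, 1, 39, 0, 0, 0, 0, 0, 0, 188948, 192815, 208365, 88004, 31237, 15980, 8500, 20000, 5003, 3047, 5000, 1000, 1, 0, 2
29961, 150000, 2, 43, -1, -1, -1, -1, 0, 0, 1683, 1828, 3502, 8979, 5190, 0, 1837, 3526, 8998, 129, 0, 0, 1, 0, 2
29962, 30000, 2, 37, 4, 3, 2, -1, 0, 0, 3565, 3356, 2758, 20878, 20582, 19357, 0, 0, 22000, 4200, 2000, 3100, 1, 1, 3
29963, 80000, 1, 41, 1, -1, 0, 0, 0, -1, -1645, 78379, 76304, 52774, 11855, 48944, 85900, 3409, 1178, 1926, 52964, 1804, 1, 1, 2
29964, 50000, 1, 46, 0, 0, 0, 0, 0, 0, 47929, 48905, 49764, 36535, 32428, 15313, 2078, 1800, 1430, 1000, 1000, 1000, 1, 1, 3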